Statistical Strategies for Pruning All the Uninteresting Association Rules

Author

Gemma C. Garriga

Abstract

We propose a general framework to formalize the problem of capturing the intensity of implication for association rules through statistical metrics. In this framework we present properties that influence the interestingness of a rule, analyze the conditions that lead a measure to perform a perfect prune at a time, and define a final proper order to sort the surviving rules. We discuss why none of the currently employed measures can capture objective interestingness, and why only a combination of some of them, applied in a multi-step fashion, can be reliable. In contrast, we propose a new, simple modification of the Pearson coefficient that meets all the necessary requirements. We statistically infer a convenient cut-off threshold for this new metric by empirically describing its distribution function through simulation. Experiments show promising behaviour of our proposal.

1 PROBLEM FORMULATION

One of the most relevant tasks in Knowledge Discovery in Databases is mining for association rules in large masses of data, as first formulated by [1]. This task is often decomposed into two separate phases: (1) finding all the frequent itemsets having support over a user-specified threshold, and (2) generating the association rules from the maximal discovered frequent itemsets.

The input of a frequent sets algorithm is a database $\mathcal{D}$, composed of a collection of transactions, where each transaction is a subset of a given fixed set of items $\mathcal{I}$. Let $X \subseteq \mathcal{I}$ be an itemset, and let $\mathrm{supp}(X)$ be the ratio of the number of transactions in which $X$ appears to the number of all transactions in $\mathcal{D}$, i.e. $\mathrm{supp}(X) = |\{t \in \mathcal{D} : X \subseteq t\}| \,/\, |\mathcal{D}|$. We denote the support of an itemset $X$ as $\mathrm{supp}(X)$. An itemset is called frequent if its support exceeds a given user-specified threshold, $\sigma$.

In the second phase, association rules are constructed from those maximal frequent sets. In brief, given any maximal frequent itemset $Z$, an association rule is an expression ...
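The definition of support and of a frequent itemset given in the problem formulation can be illustrated with a minimal Python sketch. The function names `support` and `is_frequent`, the parameter `min_support`, and the toy transaction database are assumptions made for illustration only; they do not come from the paper, and the sketch does not implement the proposed pruning metric.

```python
from typing import Iterable, List, FrozenSet


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        raise ValueError("the transaction database must not be empty")
    wanted = frozenset(itemset)
    # A transaction supports the itemset when the itemset is a subset of it.
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def is_frequent(itemset: Iterable[str],
                transactions: List[FrozenSet[str]],
                min_support: float) -> bool:
    """An itemset is frequent if its support strictly exceeds the threshold."""
    return support(itemset, transactions) > min_support


# Hypothetical toy database of four transactions (illustrative data only).
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

if __name__ == "__main__":
    print(support({"bread", "milk"}, TRANSACTIONS))   # 2 of 4 transactions
    print(is_frequent({"bread"}, TRANSACTIONS, 0.5))  # 3 of 4 exceeds 0.5
```

The strict comparison follows the wording "exceeds a given user-specified threshold"; an implementation that treats the threshold as inclusive would use `>=` instead.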

Similar resources

Interestingness and Pruning of Mined Patterns

We study the following question: when can a mined pattern, which may be an association, a correlation, a ratio rule, or any other, be regarded as interesting? Previous approaches to answering this question have been largely numeric. Specifically, we show that the presence of some rules may make others redundant, and therefore uninteresting. We articulate these principles and formalize them in the ...

On Optimal Rule Mining: A Framework and a Necessary and Sufficient Condition of Antimonotonicity

Many studies have shown the limits of support/confidence framework used in Apriori-like algorithms to mine association rules. There are a lot of efficient implementations based on the antimonotony property of the support but candidate set generation is still costly. In addition many rules are uninteresting or redundant and one can miss interesting rules like nuggets. One solution is to get rid ...

Direct Interesting Rule Generation

An association rule generation algorithm usually generates too many rules, including a lot of uninteresting ones. Many interestingness criteria have been proposed to prune those uninteresting rules. However, they work in a post-pruning process and hence do not improve the rule generation efficiency. In this paper, we discuss properties of informative rule set and conclude that the informative rule set in...

An Association Rules Survey for Redundancy Reduction and Desired Rules with Ontology

In data mining, generating association rules is still an important research issue, and the usefulness of association rules is strongly limited by the huge number of delivered rules. To overcome this drawback, several methods have been proposed for reducing redundant rules and uninteresting patterns. However, being generally based on statistical information, most of these methods do not guarant...

On pruning strategies for discovery of generalized and quantitative association rules

Mining association rules has become an important data mining task, and many algorithms have since been developed that differ in several aspects. In this paper, we analyse and compare the pruning strategies of several algorithms that were designed for mining generalised and quantitative association rules while abstracting from other technical details. Furthermore, we sketch a novel pru...

Publication date: 2004
